[live-migration] adds the paths for enabling save/restore #2709

Open
rawahars wants to merge 1 commit into microsoft:main from rawahars:lm_gcs_methods

rawahars commented Apr 27, 2026

Summary

Introduces the host-side primitives needed to snapshot in-flight container state on the source and re-attach to it on the destination, without disturbing the existing create paths.

  • cow: add MigrationState struct (stdin/stdout/stderr vsock ports + outstanding WaitForProcess bridge call id) and a Process.MigrationState() accessor used by the save path. Stubbed (zero value) on hcs.Process and jobcontainers.JobProcess, since neither uses vsock or a GCS bridge.

  • internal/gcs:

    • Process records the stdio vsock ports allocated by gc.exec and exposes them (along with the waitCall id) via MigrationState. Close tolerates nil io channels so streams not opened on the restore path don't panic.
    • Process.ExitCode tolerates an hrNotFound result from the WaitForProcess RPC (the guest may have already reaped the process by the time the restored host re-subscribes); Process.Wait now routes through ExitCode so it inherits the same tolerance.
    • Rename CloneContainer -> OpenContainer as the generic "attach to an already-running container" entry point.
    • Add Container.OpenProcessWithIO: restore-side counterpart of CreateProcess that re-listens on the supplied vsock ports and re-subscribes to the process exit notification.
    • Add GuestConnection.NextPort / SetNextPort to snapshot and seed the IO port allocator floor so restored processes don't collide with newly-allocated ones.
  • internal/cmd: add cmd.Attach, the destination-side counterpart of Command/CommandContext that binds a Cmd to a caller-resolved process and wires the IO relays. Relay wiring factored out of Start into startRelay. Tests cover Attach lifetime and IO flow.

  • internal/guest/bridge: reset Bridge.protVer to PvInvalid at the top of ListenAndServe so a fresh NegotiateProtocol after a reconnect dispatches to the PvInvalid-registered handler instead of falling through to UnknownMessageHandler. Covered by new TestBridge_ListenAndServeResetsProtocolVersion.

  • internal/vm/guestmanager: add Guest.OpenContainer, NextPort and SetNextPort wrappers over the underlying GCS connection.

  • internal/vm/vmmanager:

    • Add UtilityVM.PropertiesV3 exposing the V3 HCS property query for the save path.
    • New migration.go adds the migration lifecycle wrappers on UtilityVM: StartWithMigrationOptions (destination start), InitializeLiveMigrationOnSource, StartLiveMigrationOnSource, StartLiveMigrationTransfer, FinalizeLiveMigration, and MigrationNotifications for receiving migration event payloads.
  • pkg/migration: add parse.go with protobuf -> HCS schema converters for migration initialization options (memory transport, throttle parameters, compression settings).

  • Test and mock updates across cow, cmd, gcs, hcs, jobcontainers, and bridge packages (including regenerated internal/controller/process/mocks/mock_cow.go) to satisfy the new MigrationState contract and exercise Attach / bridge reconnect behavior.
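
The NextPort / SetNextPort item above describes snapshotting the IO port allocator floor on the source and seeding it on the destination. A minimal sketch of that idea follows. It is not the PR's code: `portAllocator`, its `allocate` method, and the base value 1000 are hypothetical, chosen only to show why seeding the floor avoids collisions between restored and newly allocated ports.

```go
package main

import (
	"fmt"
	"sync"
)

// portAllocator is a hypothetical stand-in for the IO port allocator that the
// PR exposes through GuestConnection; the real layout may differ.
type portAllocator struct {
	mu   sync.Mutex
	next uint32 // next port to hand out (the allocator "floor")
}

// allocate hands out the current port and advances the floor.
func (a *portAllocator) allocate() uint32 {
	a.mu.Lock()
	defer a.mu.Unlock()
	p := a.next
	a.next++
	return p
}

// NextPort reports the floor without allocating, so a save path can snapshot it.
func (a *portAllocator) NextPort() uint32 {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.next
}

// SetNextPort seeds the floor, so a restore path can resume from a snapshot.
func (a *portAllocator) SetNextPort(p uint32) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.next = p
}

func main() {
	// Source side: one process takes three stdio ports, then the floor is saved.
	src := &portAllocator{next: 1000} // 1000 is an arbitrary illustrative base
	stdin, stdout, stderr := src.allocate(), src.allocate(), src.allocate()
	saved := src.NextPort()

	// Destination side: seed the floor before allocating anything new.
	dst := &portAllocator{}
	dst.SetNextPort(saved)
	fresh := dst.allocate()

	fmt.Println("restored ports:", stdin, stdout, stderr)
	fmt.Println("first new port:", fresh)
}
```

Without the `SetNextPort` call, the destination allocator in this sketch would start from its zero value and could hand out a port that a restored process is already listening on.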

rawahars requested a review from a team as a code owner on April 27, 2026 19:17
rawahars force-pushed the lm_gcs_methods branch 3 times, most recently from abe57f8 to 62083de (April 28, 2026 06:33)
rawahars force-pushed the lm_gcs_methods branch 2 times, most recently from b63c543 to 8b37be7 (June 9, 2026 20:08)

Host-side primitives to snapshot in-flight container state on the
source and re-attach to it on the destination, without disturbing
the existing create paths.

- cow: add MigrationState (vsock stdio ports + WaitForProcess call
  id) and Process.MigrationState() accessor used by the save path.
  Stubbed (zero value) on hcs.Process and jobcontainers.JobProcess.

- gcs:
  - Process records the stdio vsock ports allocated by gc.exec and
    exposes them via MigrationState; Close tolerates nil io channels
    for streams not opened on restore.
  - ExitCode tolerates hrNotFound from WaitForProcess (guest may
    have reaped the process before the restored host re-subscribes);
    Wait now routes through ExitCode.
  - Rename CloneContainer -> OpenContainer as the generic "attach to
    an already-running container" entry point.
  - Add Container.OpenProcessWithIO: restore counterpart of
    CreateProcess that re-listens on supplied vsock ports and
    re-subscribes to the exit notification.
  - Add GuestConnection.NextPort / SetNextPort to snapshot and seed
    the IO port allocator floor so restored processes don't collide
    with newly-allocated ones.

- cmd: add Attach, the destination counterpart of Command /
  CommandContext that binds a Cmd to a caller-resolved process and
  wires the IO relays (factored out of Start into startRelay).

- guest/bridge: reset Bridge.protVer to PvInvalid in ListenAndServe
  so a fresh NegotiateProtocol after reconnect dispatches to the
  PvInvalid handler instead of UnknownMessageHandler.

- vm/guestmanager: add Guest.OpenContainer, NextPort and SetNextPort
  wrappers over the underlying GCS connection.

- vm/vmmanager: add UtilityVM.PropertiesV3 and migration.go with the
  migration lifecycle wrappers (StartWithMigrationOptions,
  Initialize/Start/Transfer/FinalizeLiveMigration,
  MigrationNotifications).

- pkg/migration: add parse.go with protobuf -> HCS schema converters
  for migration init options (memory transport, throttle params,
  compression settings).

- Test and mock updates (cmd, gcs, hcs, jobcontainers, bridge,
  controller/process mocks) for the new MigrationState contract,
  Attach, and bridge reconnect behavior.

Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
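
The ExitCode change described above (tolerating hrNotFound from WaitForProcess, with Wait routed through ExitCode) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `errNotFound`, `errBridgeClosed`, `waitFunc`, and the `known` return value are hypothetical, and the PR text does not say what exit code the real method reports when the guest has already reaped the process.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for the hrNotFound result.
var errNotFound = errors.New("process not found in guest")

// errBridgeClosed is a hypothetical example of an error that is not tolerated.
var errBridgeClosed = errors.New("bridge closed")

// waitFunc models the WaitForProcess RPC: it returns the exit code or an error.
type waitFunc func() (int, error)

// exitCode treats "not found" as "the guest already reaped the process"
// rather than as a failure. The known flag is false in that case because the
// exit code can no longer be recovered in this sketch.
func exitCode(wait waitFunc) (code int, known bool, err error) {
	code, err = wait()
	switch {
	case err == nil:
		return code, true, nil
	case errors.Is(err, errNotFound):
		return 0, false, nil
	default:
		return 0, false, err
	}
}

// wait routes through exitCode so it inherits the same tolerance.
func wait(w waitFunc) error {
	_, _, err := exitCode(w)
	return err
}

func main() {
	reaped := func() (int, error) {
		return 0, fmt.Errorf("WaitForProcess: %w", errNotFound)
	}
	code, known, err := exitCode(reaped)
	fmt.Println(code, known, err)
	fmt.Println(wait(reaped))

	broken := func() (int, error) { return 0, errBridgeClosed }
	fmt.Println(wait(broken))
}
```

Routing `wait` through `exitCode` keeps the tolerance in one place, which matches the stated intent that Wait "inherits the same tolerance".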